Goto

Collaborating Authors

 sequence modeling problem


Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

Neural Information Processing Systems

Large sequence models (SM) such as GPT series and BERT have displayed outstanding performance and generalization capabilities in natural language process, vision and recently reinforcement learning. A natural follow-up question is how to abstract multi-agent decision making also as an sequence modeling problem and benefit from the prosperous development of the SMs. In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into SM problems wherein the objective is to map agents' observation sequences to agents' optimal action sequences. Our goal is to build the bridge between MARL and SMs so that the modeling power of modern sequence models can be unleashed for MARL. Central to our MAT is an encoder-decoder architecture which leverages the multi-agent advantage decomposition theorem to transform the joint policy search problem into a sequential decision making process; this renders only linear time complexity for multi-agent problems and, most importantly, endows MAT with monotonic performance improvement guarantee. Unlike prior arts such as Decision Transformer fit only pre-collected offline data, MAT is trained by online trial and error from the environment in an on-policy fashion. To validate MAT, we conduct extensive experiments on StarCraftII, Multi-Agent MuJoCo, Dexterous Hands Manipulation, and Google Research Football benchmarks. Results demonstrate that MAT achieves superior performance and data efficiency compared to strong baselines including MAPPO and HAPPO.


Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

Neural Information Processing Systems

Large sequence models (SM) such as GPT series and BERT have displayed outstanding performance and generalization capabilities in natural language process, vision and recently reinforcement learning. A natural follow-up question is how to abstract multi-agent decision making also as an sequence modeling problem and benefit from the prosperous development of the SMs. In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into SM problems wherein the objective is to map agents' observation sequences to agents' optimal action sequences. Our goal is to build the bridge between MARL and SMs so that the modeling power of modern sequence models can be unleashed for MARL. Central to our MAT is an encoder-decoder architecture which leverages the multi-agent advantage decomposition theorem to transform the joint policy search problem into a sequential decision making process; this renders only linear time complexity for multi-agent problems and, most importantly, endows MAT with monotonic performance improvement guarantee.


How Crucial is Transformer in Decision Transformer?

Siebenborn, Max, Belousov, Boris, Huang, Junning, Peters, Jan

arXiv.org Artificial Intelligence

Decision Transformer (DT) is a recently proposed architecture for Reinforcement Learning that frames the decision-making process as an auto-regressive sequence modeling problem and uses a Transformer model to predict the next action in a sequence of states, actions, and rewards. In this paper, we analyze how crucial the Transformer model is in the complete DT architecture on continuous control tasks. Namely, we replace the Transformer by an LSTM model while keeping the other parts unchanged to obtain what we call a Decision LSTM model. We compare it to DT on continuous control tasks, including pendulum swing-up and stabilization, in simulation and on physical hardware. Our experiments show that DT struggles with continuous control problems, such as inverted pendulum and Furuta pendulum stabilization. On the other hand, the proposed Decision LSTM is able to achieve expert-level performance on these tasks, in addition to learning a swing-up controller on the real system. These results suggest that the strength of the Decision Transformer for continuous control tasks may lie in the overall sequential modeling architecture and not in the Transformer per se.


Introduction to Sequence Modeling Problems

#artificialintelligence

Consider the problem of predicting the health risk of a person based on multiple health parameters and we have decided to model the true relationship between the input and output using Feed-forward neural networks (also known as Multi-layered Network of Neurons). In Feed-forward Neural Networks (FNN) the output of one data point is completely independent of the previous input i.e… the health risk of the second person is not dependent on the health risk of the first person and so on. Similarly, in the case of Convolution Neural Networks (CNN), the output from the softmax layer in the context of image classification is entirely independent of the previous input image. Citation Note: The content and the structure of this article is based my understand of the deep learning lectures from One-Fourth Labs -- PadhAI. Sequence Modeling is the task of predicting what word/letter comes next. Unlike the FNN and CNN, in sequence modeling, the current output is dependent on the previous input and the length of the input is not fixed.


Introduction to Sequence Modeling Problems - WebSystemer.no

#artificialintelligence

In sequence learning problems, we know that the true output at timestep't' is dependent on all the inputs that the model has seen up to the time step't'. Since we don't know the true relationship, we need to come up with an approximation such that the function would depend on all the previous inputs. The key thing to note here is that the task is not changing for every timestep, whether we are predicting the next character or tagging the part of speech of a word. The input to the function is changing at every time step because for longer sentences the function needs to keep track of the larger set of words. Recurrent Neural Networks(RNN) are a type of Neural Network where the output from the previous step is fed as input to the current step.